ABSTRACT



Speech recognition is one of five main areas in the field of 
speech processing. Difficulties in speech recognition include variability of
sounds within and across speakers, of the channel, of background noise, and of
speech production. Speech recognition can be used in a variety of situations:
to perform query operations and phone call transfers; for data entry; for 
command and control operations; and in dictation. Technical characteristics 
of speech recognition systems depend on several variables, the most important 
of which are vocabulary size, speaker dependence, speaking mode, domain
dependence, and multiple language support. Knowledge sources are based on 
three models: set of phonemes (acoustic); word lexicon; and language. The 
objective of the speech recognition process is to determine the sequence of 
words that most probably caused the observed sequence of acoustic vectors. 
Currently, speech recognition systems can recognize a large number of words, 
recognize discrete speech, handle 70-100 words per minute, and handle several 
languages with a high recognition rate. In the future, speech recognition 
systems will be able to handle any speaker without need for training, 
continuous speech, very large vocabularies, telephone communication, and 
natural language understanding. (MSE) 

1. Areas of Speech Processing

There are five main areas in the field of Speech Processing: 

1) Speech Coding deals with the compression of the digital representation of 
the speech signal in order to facilitate economical transmission or storage. 

2) In Speech Synthesis, a synthetic speech signal is created from preexisting text
in an attempt to reach maximum intelligibility and naturalness.

3) Using techniques for Speaker Identification, the machine identifies the
speaker by his/her voice in order to restrict access to information,
computers, or physical premises.

4) In Speech Recognition, the information in a spoken message is identified so
as to have the computer perform the corresponding command or transcribe
the dictated text in written form.

5) Finally, Spoken Language Translation deals with two-way communication
via speech: a spoken message is identified, translated into a different language,
and the translation is synthesised as speech, for example to enable a
dialogue between speakers of different languages.



2. Difficulties in Speech Recognition 

There are some well-known difficulties in the field of speech recognition,
shown in the list below:

- The variability of sounds (words, phrases, subword units), within a single 
speaker and across different speakers. 

- The variability of channel, depending on the characteristics of the different 
types of microphones. 

- The variability of background noise: side conversations, street noise, 
telephone rings, etc. 

- The variability of speech production, which adds spurious sounds to the words
proper (mouth clicks, hesitations, breath noise).



3. Main Functions of Speech Recognition 

Speech recognition can be used in a variety of situations: 

1) To perform Query operations, such as the consultation via telephone of a 
bank for account balances, the consultation of phone information lines for 
theatre schedules and the like, and also for phone call transfers.

2) Data entry situations may include the giving of a credit card number,
dialing from mobile phones, filling out forms, and booking airline 
reservations. 

3) Command and Control operations in which speech recognition is important
occur when the hands and/or eyes are busy, during menu navigation
and machine control, and while completing darkroom work.

4) Speech recognition plays a key role in dictation when entering free text
into a computer via speech.



4. Technical Characteristics of Speech Recognition Systems 

The technical characteristics of speech recognition systems depend on several 
variables, the most important of which are the following: 

1) The vocabulary size can range from small (10-100 words) for simple 
commands, to medium (1000 words) for form filling, or to large (more than 
20 000) for such complex situations as dictation. 

2) The speaker dependence of a given system can vary from being trained
to a specific speaker, to being adaptive to each user as (s)he speaks, or
even to being speaker independent.

3) The speaking mode varies between continuous speech and isolated words,
where pauses between words are needed for adequate recognition.

4) Speech recognition systems can be domain dependent, meaning they can
only recognize a constrained syntax (e.g., a list of commands or of questions),
or domain independent, where free text can be dictated.

5) Multiple language support is also an important characteristic. 



5. Knowledge Sources in Speech Recognition 

The knowledge sources in speech recognition are based on three different 
models: 

1) Set of Phoneme Models (Acoustic Model):

Each model refers to the typical sound of a phoneme, specified by the
probability distribution of its spectral and temporal properties.

2) Word Lexicon:

Each word is represented as a sequence of the above phonemes.

3) Language Model:

A statistical model extracted from large corpora of texts.
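
As a rough illustration of how these three knowledge sources might be held in
a program, the following Python sketch uses toy data. Everything in it is an
assumption made for illustration and is not taken from any system described
here: the phoneme symbols, the single made-up acoustic feature with a Gaussian
distribution per phoneme, the two-word lexicon, the three-sentence corpus, and
the names PHONEME_MODELS, LEXICON, CORPUS, acoustic_log_likelihood and
train_bigram.

```python
import math
from collections import Counter

# 1) Phoneme models: each phoneme gets a (mean, std) Gaussian over ONE
#    made-up acoustic feature. Real systems model spectral vectors over time.
PHONEME_MODELS = {
    "y": (1.0, 0.5), "eh": (3.0, 0.5), "s": (5.0, 0.5),
    "n": (2.0, 0.5), "ow": (4.0, 0.5),
}

# 2) Word lexicon: each word is a sequence of the phonemes above.
LEXICON = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}

# 3) Language model: bigram statistics estimated from a tiny invented corpus.
CORPUS = ["yes yes no", "no yes", "yes no no"]


def log_gaussian(x, mean, std):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def acoustic_log_likelihood(word, frames):
    """Score frames against a word, assuming exactly one frame per phoneme
    (a drastic simplification of real temporal modelling)."""
    phonemes = LEXICON[word]
    if len(frames) != len(phonemes):
        return float("-inf")
    return sum(log_gaussian(x, *PHONEME_MODELS[p]) for x, p in zip(frames, phonemes))


def train_bigram(corpus):
    """Return a function prob(prev, word) with add-one smoothing."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        words = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        vocab.update(words[1:])
        histories.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))

    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))

    return prob


bigram = train_bigram(CORPUS)
print(bigram("<s>", "yes"))                            # language model score
print(acoustic_log_likelihood("yes", [1.1, 2.9, 5.2]))  # acoustic score
```

The point of the sketch is only the division of labour: the phoneme models
score sounds, the lexicon ties words to phoneme sequences, and the language
model scores word sequences independently of the audio.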

6. Speech Recognition Process

The objective of the speech recognition process is to determine the sequence of
words that most probably caused the observed sequence of acoustic vectors (see
Figure 1).

Figure 1. Determining the sequence of words that MOST PROBABLY caused
the observed sequence of acoustic vectors
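
A common way to write this objective is as a Bayes decision rule. The notation
is assumed here and does not appear in the original: A = a_1, ..., a_T is the
observed sequence of acoustic vectors, W = w_1, ..., w_N is a candidate word
sequence, and \hat{W} is the recognised sequence.

```latex
\begin{align*}
\hat{W} &= \arg\max_{W} P(W \mid A) \\
        &= \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)} \\
        &= \arg\max_{W} P(A \mid W)\, P(W)
\end{align*}
```

The last step holds because P(A) does not depend on W. Under this reading,
P(A | W) is supplied by the phoneme models together with the word lexicon, and
P(W) by the language model.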

7. Speech Recognition Today and Tomorrow

An example of a present-day Speech Recognition system is the IBM VoiceType 
Dictation System. Its most important characteristics are: 

- Runs on a 486 SX 25

- Recognises more than 30K different words 

- Needs a short enrollment process 

- Recognises discrete speech (with small pauses between words) 

- Able to handle 70-100 words per minute 

- Available for 6 languages 

- Achieves a very high recognition rate (>96%)
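
The text does not say how this recognition rate was measured. One common
convention, assumed here purely for illustration, is word accuracy: one minus
the word-level edit distance between what was said and what was recognised,
divided by the number of words said. The function names and the example
sentences below are invented.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference (Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def recognition_rate(reference, hypothesis):
    """Word accuracy: 1 - (edit distance / number of reference words)."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_edit_distance(reference, hypothesis) / n


# Invented example: one substituted word out of five.
print(recognition_rate("please call the bank today",
                       "please call a bank today"))
```

Under this convention, a rate above 96% would correspond to fewer than four
word errors per hundred words spoken.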

Tomorrow, however, research promises much more. Speech recognition
systems will be able to handle:

- Any speaker, without need for training

- Continuous speech

- Very large vocabularies (more than 250K words)

- Telephone communication

- Natural language understanding

- Operation on Personal Digital Assistants

These systems will be used in dictation, phone mail, database access, home
shopping, translation, and much more.

Most important of all, speech will be an "enabler", i.e., existing and new
applications will be accessible using speech.



8. Main Players in the Field of Speech Recognition 

The main players in the field of speech recognition are the following: 

- the European Community 

- ARPA (Wall Street Journal Contest, Air Travel Information System (ATIS))

- Industrial Research (IBM in dictation, AT&T for phone services, and
many smaller companies)

One of the continual points of discussion in the field of speech recognition
is the relative importance of English as compared to other languages.
Nonetheless, speech systems are being developed for other major languages
as well (e.g., French, German, Spanish).